Integer Linear Programming (ILP) provides a viable mechanism for encoding explicit and controllable assumptions about explainable multi-hop inference with natural language. However, ILP formulations are non-differentiable and cannot be integrated into broader deep learning architectures. Recently, Thayaparan et al. (2021a) proposed a novel methodology that integrates ILP with Transformers to achieve end-to-end differentiability for complex multi-hop inference. While this hybrid framework has been shown to deliver better answer and explanation selection than Transformer-based models and existing ILP solvers, the neuro-symbolic integration still relies on a convex relaxation of the ILP formulation, which can produce sub-optimal solutions. To address these limitations, we propose Diff-Comb Explainer, a novel neuro-symbolic architecture based on Differentiable Blackbox Combinatorial Solvers (DBCS) (Pogančić et al., 2019). Unlike existing differentiable solvers, the proposed model does not require the transformation and relaxation of the explicit semantic constraints, allowing for a direct and more efficient integration of the ILP formulation. Diff-Comb Explainer demonstrates improved accuracy and explainability over non-differentiable solvers, Transformers, and existing differentiable constraint-based multi-hop inference frameworks.
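To make the contrast with relaxation-based approaches concrete, here is a minimal sketch of the blackbox-differentiation trick of Pogančić et al. (2019) applied to a toy fact-selection ILP (choose the k lowest-cost facts as an explanation). The top-k solver and the interpolation strength `LAM` are illustrative assumptions of this sketch, not the paper's exact setup.

```python
import torch

def select_k_min_cost(costs: torch.Tensor, k: int) -> torch.Tensor:
    """Exact blackbox ILP solver: argmin_y <costs, y> s.t. y in {0,1}^n, sum(y) = k."""
    y = torch.zeros_like(costs)
    y[costs.topk(k, largest=False).indices] = 1.0
    return y

class BlackboxSolver(torch.autograd.Function):
    """Blackbox differentiation: solve exactly forward, interpolate backward."""
    LAM = 10.0  # interpolation strength (an assumed value)

    @staticmethod
    def forward(ctx, costs, k):
        y = select_k_min_cost(costs, k)
        ctx.save_for_backward(costs, y)
        ctx.k = k
        return y

    @staticmethod
    def backward(ctx, grad_y):
        costs, y = ctx.saved_tensors
        # Perturb the costs with the incoming gradient and re-solve the same ILP.
        y_prime = select_k_min_cost(costs + BlackboxSolver.LAM * grad_y, ctx.k)
        # Informative gradient: difference of the two exact solutions.
        return -(y - y_prime) / BlackboxSolver.LAM, None

# Relevance scores from a Transformer encoder become negative costs; gradients
# flow through the exact (non-relaxed) solver back into the encoder.
scores = torch.randn(8, requires_grad=True)
explanation = BlackboxSolver.apply(-scores, 3)   # select 3 facts
target = torch.tensor([1., 1., 1., 0., 0., 0., 0., 0.])
((explanation - target) ** 2).sum().backward()
print(scores.grad)
```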
To interpret neural NLI models and their reasoning strategies, we conduct a systematic probing study that investigates whether these models capture semantic features crucial to natural logic: monotonicity and concept inclusion. Correctly identifying valid inferences in downward-monotone contexts is a known stumbling block for NLI performance, subsuming linguistic phenomena such as negation scope and generalized quantifiers. To understand this difficulty, we emphasize monotonicity as a property of a context and examine the extent to which models capture monotonicity information in the contextual embeddings that are intermediate to their decision-making process. Drawing on recent advances in the probing paradigm, we compare the presence of monotonicity features across various models. We find that monotonicity information is notably weak in the representations of popular NLI models that achieve high scores on benchmarks, and we observe that improvements to these models based on fine-tuning strategies have introduced stronger monotonicity features, together with improved performance on challenge sets.
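As a concrete illustration of the probing setup, here is a minimal diagnostic-classifier sketch: a linear probe trained on frozen contextual embeddings to predict the monotonicity of a token's context. The random placeholders stand in for actual NLI-model hidden states and gold monotonicity labels, which are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2000, 768))  # placeholder: frozen NLI-model hidden states
labels = rng.integers(0, 2, size=2000)     # placeholder: 0 = upward-, 1 = downward-monotone

X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", accuracy_score(y_te, probe.predict(X_te)))
# Probe accuracy well above a control baseline suggests monotonicity is linearly
# recoverable from the intermediate representations; chance-level accuracy (as here,
# by construction) suggests it is not.
```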
Regenerating natural language explanations in the scientific domain has been proposed as a benchmark for evaluating complex multi-hop and explainable inference. In this context, large language models can achieve state-of-the-art performance when employed as cross-encoder architectures and fine-tuned on explanation regeneration tasks. However, while much attention has been devoted to the quality of the explanations, the problem of performing inference efficiently is largely understudied. Cross-encoders, in fact, are intrinsically not scalable, having limited applicability to practical scenarios that require inference over large-scale facts banks. To enable complex multi-hop reasoning at scale, this paper focuses on bi-encoder architectures, investigating the problem of scientific explanation regeneration at the intersection of dense and sparse models. Specifically, we present SCAR (for SCalable Autoregressive inference), a hybrid framework that iteratively combines a Transformer-based bi-encoder with a sparse model of explanatory power, designed to leverage explicit inference patterns in the explanations. Our experiments demonstrate that the hybrid framework significantly outperforms previous sparse models, achieving performance comparable with that of state-of-the-art cross-encoders while being approximately 50 times faster and scalable to corpora of millions of facts. Further analyses on semantic drift and multi-hop question answering reveal that the proposed hybridization boosts the quality of the most challenging explanations, contributing to improved performance on downstream inference tasks.
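The following is a minimal sketch of the autoregressive dense-sparse hybrid described above: at each step, the hypothesis plus the partially constructed explanation is re-scored against the fact bank by mixing a bi-encoder similarity with a sparse lexical proxy for explanatory power. The toy encoder, the mixing weight `alpha`, and the scoring details are illustrative assumptions rather than SCAR's exact formulation.

```python
import numpy as np

def encode(text, dim=64):
    """Toy hashed bag-of-words encoder (stand-in for a Transformer bi-encoder)."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

def sparse_score(query_tokens, fact_tokens):
    """Toy lexical-overlap proxy for the sparse model of explanatory power."""
    q, f = set(query_tokens), set(fact_tokens)
    return len(q & f) / (len(f) ** 0.5 + 1e-9)

def regenerate_explanation(hypothesis, facts, steps=2, alpha=0.5):
    explanation = []
    for _ in range(steps):
        # Autoregressive step: condition on the partially built explanation.
        query = hypothesis + " " + " ".join(explanation)
        q_vec = encode(query)
        scores = []
        for fact in facts:
            if fact in explanation:
                scores.append(-np.inf)  # no repeated facts
                continue
            dense = float(q_vec @ encode(fact))
            sparse = sparse_score(query.split(), fact.split())
            scores.append(alpha * dense + (1 - alpha) * sparse)
        explanation.append(facts[int(np.argmax(scores))])
    return explanation

facts = ["friction produces heat", "a fire produces heat", "rubbing hands causes friction"]
print(regenerate_explanation("rubbing your hands makes them warm", facts))
```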
This paper presents Diff-Explainer, the first hybrid framework for explainable multi-hop inference that integrates explicit constraints with neural architectures through differentiable convex optimization. Specifically, Diff-Explainer allows for the fine-tuning of neural representations within a constrained optimization framework to answer and explain multi-hop questions in natural language. To demonstrate the efficacy of the hybrid framework, we combine existing ILP-based solvers with Transformer-based representations. An extensive empirical evaluation on scientific and commonsense QA tasks demonstrates that the integration of explicit constraints in an end-to-end differentiable framework can significantly improve the performance of non-differentiable ILP solvers (8.91%-13.3%). Moreover, additional analysis reveals that Diff-Explainer is able to achieve strong performance compared with standalone Transformers and previous multi-hop approaches, while still providing structured explanations in support of its predictions.
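To make the differentiable convex optimization concrete, here is a minimal sketch that relaxes a fact-selection ILP into an LP and embeds it as a differentiable layer via cvxpylayers; the choice of library and the toy objective are assumptions of this sketch, not necessarily the paper's implementation.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n, k = 8, 3                                  # candidate facts, explanation size
s = cp.Parameter(n)                          # relevance scores from a Transformer
y = cp.Variable(n)                           # relaxed selection variables
problem = cp.Problem(cp.Maximize(s @ y), [y >= 0, y <= 1, cp.sum(y) == k])
layer = CvxpyLayer(problem, parameters=[s], variables=[y])

scores = torch.randn(n, requires_grad=True)  # stand-in for encoder outputs
selection, = layer(scores)                   # differentiable solve of the relaxed ILP
loss = -selection[:k].sum()                  # toy supervision: prefer the first k facts
loss.backward()                              # gradients reach the (would-be) encoder
print(scores.grad)
```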
Existing accounts of explanation emphasise the role of prior experience in the solution of new problems. However, most contemporary models for multi-hop textual inference construct explanations considering each test case in isolation. This paradigm is known to suffer from semantic drift, which causes the construction of spurious explanations leading to wrong conclusions. In contrast, we investigate an abductive framework for explainable multi-hop inference that adopts the retrieve-reuse-revise paradigm largely studied in case-based reasoning. Specifically, we present a novel framework that addresses and explains unseen inference problems by retrieving and adapting prior natural language explanations from similar training examples. We empirically evaluate the case-based abductive framework on downstream commonsense and scientific reasoning tasks. Our experiments demonstrate that, compared with existing explainable approaches, the proposed framework can be effectively integrated with sparse and dense pre-trained encoding mechanisms or downstream Transformers. Moreover, we study the impact of the retrieve-reuse-revise paradigm on explainability and semantic drift, showing that it boosts the quality of the constructed explanations, resulting in improved downstream inference performance.
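A minimal sketch of the retrieve-reuse-revise loop described above: retrieve the most similar training cases, reuse their explanation facts, and revise by re-ranking the inherited facts against the new problem. The Jaccard similarity and the revision heuristic are illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / (len(sa | sb) or 1)

def retrieve(query, case_base, top_n=1):
    """RETRIEVE: find the training cases most similar to the unseen problem."""
    return sorted(case_base, key=lambda c: jaccard(query, c["question"]), reverse=True)[:top_n]

def reuse_and_revise(query, cases, max_facts=2):
    """REUSE the retrieved explanations; REVISE by re-ranking the inherited facts
    against the new problem and keeping only those that still apply."""
    candidates = [fact for case in cases for fact in case["explanation"]]
    return sorted(candidates, key=lambda f: jaccard(query, f), reverse=True)[:max_facts]

case_base = [
    {"question": "why does rubbing hands make them warm",
     "explanation": ["friction produces heat", "rubbing causes friction"]},
    {"question": "why do plants need sunlight",
     "explanation": ["plants use light for photosynthesis"]},
]
query = "why does rubbing two sticks together make them hot"
print(reuse_and_revise(query, retrieve(query, case_base)))
```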
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downward neurons of the network. Each of the downward neurons will use their copy of this signal as one of many dendritic inputs, integrate them all, and fire an output if the result is above some threshold. In the artificial neural network, this translates to the fact that the nonlinear filtering of the signal is performed in the upward neuron, meaning that in practice the same activation is shared between all the downward neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model for the biological neuron, where dendrites play an active role: the activation in the output of the upward neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filterings, before the linear combination. We implement this new model into a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit into fully connected and convolutional layers and estimate their FLOPs and weight changes. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements of up to 1.73% over standard ResNets. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
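One plausible reading of the proposed unit, as a Keras layer: each input-output connection (dendrite) gets its own weight, bias, and ReLU applied before the linear combination, i.e. y_j = sum_i relu(w_ij * x_i + b_ij). This sketch is an assumption-laden illustration, not the authors' published implementation; note that it roughly doubles the parameter count relative to a standard dense layer, in line with the weight-change analysis mentioned above.

```python
import tensorflow as tf

class ActiveDendriteDense(tf.keras.layers.Layer):
    """Dense unit where each dendrite filters its input before the summation."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        n_in = int(input_shape[-1])
        # One weight and one bias per dendrite, i.e. per input-output pair.
        self.w = self.add_weight(shape=(n_in, self.units), initializer="glorot_uniform")
        self.b = self.add_weight(shape=(n_in, self.units), initializer="zeros")

    def call(self, x):
        pre = x[..., tf.newaxis] * self.w + self.b     # per-dendrite pre-activations
        return tf.reduce_sum(tf.nn.relu(pre), axis=1)  # nonlinearity BEFORE the sum

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    ActiveDendriteDense(128),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```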
The open-radio access network (O-RAN) embraces cloudification and network function virtualization for base-band function processing by dis-aggregated radio units (RUs), distributed units (DUs), and centralized units (CUs). These enable the cloud-RAN vision in full, where multiple mobile network operators (MNOs) can install their proprietary or open RUs, but lease on-demand computational resources for DU-CU functions from commonly available open-clouds via open x-haul interfaces. In this paper, we propose and compare the performances of min-max fairness and Vickrey-Clarke-Groves (VCG) auction-based x-haul and DU-CU resource allocation mechanisms to create a multi-tenant O-RAN ecosystem that is sustainable for small, medium, and large MNOs. The min-max fair approach minimizes the maximum OPEX of RUs through cost-sharing proportional to their demands, whereas the VCG auction-based approach minimizes the total OPEX for all resources utilized while extracting truthful demands from RUs. We consider time-wavelength division multiplexed (TWDM) passive optical network (PON)-based x-haul interfaces where PON virtualization technique is used to flexibly provide optical connections among RUs and edge-clouds at macro-cell RU locations as well as open-clouds at the central office locations. Moreover, we design efficient heuristics that yield significantly better economic efficiency and network resource utilization than conventional greedy resource allocation algorithms and reinforcement learning-based algorithms.
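As a worked illustration of the VCG side of the comparison, the sketch below runs a toy auction in which RUs bid for units of shared DU-CU compute and each winner pays the externality it imposes on the others; the single-unit-demand setting and the bid values are simplifying assumptions.

```python
def best_welfare(bids, capacity, exclude=None):
    """Optimal welfare when each bidder demands one unit: the top-`capacity` bids win."""
    pool = {i: b for i, b in bids.items() if i != exclude}
    winners = sorted(pool, key=pool.get, reverse=True)[:capacity]
    return winners, sum(pool[i] for i in winners)

def vcg(bids, capacity):
    winners, welfare = best_welfare(bids, capacity)
    payments = {}
    for i in winners:
        _, welfare_without_i = best_welfare(bids, capacity, exclude=i)
        others_with_i = welfare - bids[i]                # others' value in the allocation
        payments[i] = welfare_without_i - others_with_i  # externality imposed on others
    return winners, payments

# Three RUs of different sizes bid for two units of DU-CU compute.
bids = {"RU_small": 3.0, "RU_medium": 5.0, "RU_large": 8.0}
print(vcg(bids, capacity=2))
# -> winners ['RU_large', 'RU_medium'], each paying 3.0 (the displaced bid);
#    under VCG, truthful bidding is a dominant strategy.
```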
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
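For readers who want the object of study in code, here is a minimal sketch of a non-linear two-layer autoencoder trained in the proportional regime (latent width scaling linearly with the input dimension) on Gaussian data; tanh is used as a smooth surrogate for the sign activation, and all sizes and training choices are illustrative.

```python
import torch

d, r = 200, 100                               # proportional regime: r/d fixed (here 1/2)
x = torch.randn(1024, d)                      # Gaussian source

encoder = torch.nn.Linear(d, r, bias=False)
decoder = torch.nn.Linear(r, d, bias=False)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(500):
    x_hat = decoder(torch.tanh(encoder(x)))   # non-linear two-layer autoencoder
    loss = ((x - x_hat) ** 2).mean()          # MSE distortion, a proxy for population risk
    opt.zero_grad()
    loss.backward()
    opt.step()

print("per-coordinate distortion:", loss.item())
```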
Profile extrusion is a continuous production process for manufacturing plastic profiles from molten polymer. Especially interesting is the design of the die, through which the melt is pressed to attain the desired shape. However, due to an inhomogeneous velocity distribution at the die exit or residual stresses inside the extrudate, the final shape of the manufactured part often deviates from the desired one. To avoid these deviations, the shape of the die can be computationally optimized, which has already been investigated in the literature using classical optimization approaches. A new approach in the field of shape optimization is the utilization of Reinforcement Learning (RL) as a learning-based optimization algorithm. RL is based on trial-and-error interactions of an agent with an environment. For each action, the agent is rewarded and informed about the subsequent state of the environment. While not necessarily superior to classical optimization algorithms, e.g., gradient-based or evolutionary methods, for a single problem, RL techniques are expected to perform especially well when similar optimization tasks are repeated, since the agent learns a more general strategy for generating optimal shapes instead of concentrating on a single problem. In this work, we investigate this approach by applying it to two 2D test cases. The flow-channel geometry can be modified by the RL agent using so-called Free-Form Deformation, a method where the computational mesh is embedded into a transformation spline, which is then manipulated based on the control-point positions. In particular, we investigate the impact of utilizing different agents on the training progress and the potential for saving wall time by utilizing multiple environments during training.
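A minimal sketch of the RL-environment side of this setup: the agent's action displaces Free-Form Deformation control points, the embedded mesh deforms through a Bernstein-weighted spline, and the reward scores the resulting profile. The 1D geometry and the toy reward are stand-ins for the actual flow simulation.

```python
import numpy as np
from math import comb

def ffd_1d(points, offsets):
    """Free-Form Deformation in 1D: Bernstein-weighted sum of control-point offsets."""
    n = len(offsets) - 1
    basis = np.array([[comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)]
                      for t in points])
    return points + basis @ offsets

class DieShapeEnv:
    """Gym-style environment: state = control offsets, action = offset increments."""

    def __init__(self, n_ctrl=4, n_mesh=50):
        self.mesh = np.linspace(0.0, 1.0, n_mesh)
        self.target = self.mesh + 0.05 * np.sin(np.pi * self.mesh)  # desired profile
        self.offsets = np.zeros(n_ctrl)

    def reset(self):
        self.offsets[:] = 0.0
        return self.offsets.copy()

    def step(self, action):
        self.offsets += np.clip(action, -0.01, 0.01)   # move the FFD control points
        shape = ffd_1d(self.mesh, self.offsets)
        reward = -np.mean((shape - self.target) ** 2)  # stand-in for flow-quality metric
        return self.offsets.copy(), reward, False, {}

env = DieShapeEnv()
env.reset()
_, reward, _, _ = env.step(np.full(4, 0.01))  # one policy step
print(reward)
```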